116 research outputs found

Enhancing Large Language Models for Secure Code Generation: A Dataset-driven Study on Vulnerability Mitigation

Author: Cai Yi
Cao Liuwen
Jatowt Adam
Luo Xitong
Wang Jiexin
Xie Jiayuan
Zhou Zhiping
Publication date: 24/10/2023

Large language models (LLMs) have brought significant advancements to code generation, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like GitHub, introduces the risk of inadvertently propagating security vulnerabilities. To effectively mitigate this concern, this paper presents a comprehensive study focused on evaluating and enhancing code LLMs from a software security perspective. We introduce SecuCoGen, a meticulously curated dataset targeting 21 critical vulnerability types (SecuCoGen has been uploaded as supplemental material and will be made publicly available after publication). SecuCoGen comprises 180 samples and serves as the foundation for conducting experiments on three crucial code-related tasks: code generation, code repair and vulnerability classification, with a strong emphasis on security. Our experimental results reveal that existing models often overlook security concerns during code generation, leading to the generation of vulnerable code. To address this, we propose effective approaches to mitigate the security vulnerabilities and enhance the overall robustness of code generated by LLMs. Moreover, our study identifies weaknesses in existing models' ability to repair vulnerable code, even when provided with vulnerability information. Additionally, certain vulnerability types pose challenges for the models, hindering their performance in vulnerability classification. Based on these findings, we believe our study will have a positive impact on the software engineering community, inspiring the development of improved methods for training and utilizing LLMs, thereby leading to safer and more trustworthy model deployment.

arXiv.org e-Print Archive

Post-mastectomy radiotherapy can improve survival in breast cancer patients aged 35 years or younger with four or more positive nodes but not in one to three positive nodes

Author: Fengyan Li
Jiayuan Sun
Juan Zhou
Qin Lin
Qun Li
Sangang Wu
Publication venue: 'Dove Medical Press Ltd.'

Crossref

The Effect of Long-Term or Repeated Use of Antibiotics in Children and Adolescents on Cognitive Impairment in Middle-Aged and Older Person(s) Adults: A Cohort Study

Author: Chen Xiaoxia
Li Zhichao
Liao Zhimin
Liu Lingying
Liu Zhou
Wang Duolao
Wei Shouchao
Wei Zhuangsheng
Wu Jiayuan
Zhou Haihong
Publication venue: 'Frontiers Media SA'
Publication date: 23/03/2022

Objectives: We evaluated the effects of long-term/recurrent use of antibiotics in childhood on developing cognitive impairment in middle and old age, using the UK Biobank database. Methods: UK Biobank recruited participants aged 37–73 years. Cognitive impairment was ascertained by a fluid intelligence questionnaire. The primary outcome was the occurrence of cognitive impairment in middle and old age. Multivariate logistic regression models were used to explore the relationship between long-term/recurrent use of antibiotics and cognitive impairment. Results: Over 3.8–10.8 years’ follow-up, 4,781 of the 35,921 participants developed cognitive impairment. The odds of cognitive impairment in middle and old age among participants with long-term/recurrent use of antibiotics in childhood were increased by 18% compared with their counterparts (adjusted odds ratio 1.18, 95% confidence interval 1.08–1.29, p < 0.01). The effect of long-term/recurrent use of antibiotics in childhood on cognitive impairment was homogeneous across different categories of various subgroup variables such as sex, age, APOE4, ethnic groups, income before tax, smoking status, alcohol status, BMI, hypertension and diabetes, but the effect was modified by educational qualification (p-value for interaction <0.05). Conclusion: Long-term/recurrent use of antibiotics in childhood may increase the risk of cognitive impairment in middle and old age.
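
As a rough reading aid for the figures quoted in this abstract, the sketch below computes the crude proportion of participants who developed cognitive impairment and converts the adjusted odds ratio into a percentage increase in odds. The variable names are illustrative and not taken from the paper. The adjusted odds ratio comes from the authors' multivariate model and cannot be reproduced from these two counts alone.

```python
# Figures quoted in the abstract above.
cases = 4781            # participants who developed cognitive impairment
participants = 35921    # participants included in the analysis
adjusted_or = 1.18      # adjusted odds ratio reported by the authors

# Crude (unadjusted) proportion of participants with the outcome.
crude_proportion_pct = 100 * cases / participants

# An odds ratio of 1.18 corresponds to odds that are 18% higher.
odds_increase_pct = (adjusted_or - 1) * 100

print(f"Crude proportion with cognitive impairment: {crude_proportion_pct:.1f}%")
print(f"Relative increase in odds: {odds_increase_pct:.0f}%")
```

Note that an 18% increase in odds is not the same as an 18% increase in risk, although the two are close when the outcome is relatively uncommon.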

LSTM Online Archive

PubMed Central

Multi-Granularity Detector for Vulnerability Fixes

Author: Hassan Ahmed E.
Kang Hong Jin
Le Xuan-Bach D.
Le-Cong Thanh
Lo David
Nguyen Truong Giang
Widyasari Ratnadira
Xia Xin
Xu Bowen
Yang Chengran
Zhao Zhipeng
Zhou Jiayuan
Publication date: 23/05/2023

With the increasing reliance on Open Source Software, users are exposed to third-party library vulnerabilities. Software Composition Analysis (SCA) tools have been created to alert users to such vulnerabilities. SCA requires the identification of vulnerability-fixing commits. Prior works have proposed methods that can automatically identify such vulnerability-fixing commits. However, identifying such commits is highly challenging, as only a very small minority of commits are vulnerability fixing. Moreover, code changes can be noisy and difficult to analyze. We observe that noise can occur at different levels of detail, making it challenging to detect vulnerability fixes accurately. To address these challenges and boost the effectiveness of prior works, we propose MiDas (Multi-Granularity Detector for Vulnerability Fixes). Unlike prior works, MiDas constructs different neural networks for each level of code change granularity, corresponding to commit-level, file-level, hunk-level, and line-level, following their natural organization. It then utilizes an ensemble model that combines all base models to generate the final prediction. This design allows MiDas to better handle the noisy and highly imbalanced nature of vulnerability-fixing commit data. Additionally, to reduce the human effort required to inspect code changes, we have designed an effort-aware adjustment for MiDas's outputs based on commit length. The evaluation results demonstrate that MiDas outperforms the current state-of-the-art baseline in terms of AUC by 4.9% and 13.7% on Java and Python-based datasets, respectively. Furthermore, in terms of two effort-aware metrics, EffortCost@L and Popt@L, MiDas also outperforms the state-of-the-art baseline, achieving improvements of up to 28.2% and 15.9% on Java, and 60% and 51.4% on Python, respectively.
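
The idea of combining per-granularity base models into one prediction can be sketched as follows. This is a minimal illustration, not the MiDas implementation: the abstract does not say how the ensemble combines its base models, so a weighted average is assumed here, and the function name `ensemble_score`, the score dictionary, and the weights are all hypothetical.

```python
# Granularity levels named in the abstract.
LEVELS = ("commit", "file", "hunk", "line")


def ensemble_score(base_scores, weights=None):
    """Combine per-level probabilities into one score.

    Assumption: a weighted average stands in for the paper's ensemble
    model, whose actual form is not given in the abstract.
    """
    if weights is None:
        # Hypothetical default: every level contributes equally.
        weights = {level: 1.0 for level in LEVELS}
    total_weight = sum(weights[level] for level in LEVELS)
    weighted_sum = sum(weights[level] * base_scores[level] for level in LEVELS)
    return weighted_sum / total_weight


# Hypothetical base-model outputs for a single commit.
scores = {"commit": 0.8, "file": 0.6, "hunk": 0.4, "line": 0.2}
print(ensemble_score(scores))
```

In practice the combination would more likely be learned from validation data than fixed by hand; the equal weights above only keep the example small.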

arXiv.org e-Print Archive

Discriminative analysis of schizophrenia patients using graph convolutional networks: A combined multimodal MRI and connectomics analysis

Author: Fengchun Wu
Guolin Ma
Hehua Li
Jiayuan Huang
Jing Zhou
Kai Wu
LiQing Liang
Pengfei Ke
Runlin Peng
Xiaobo Li
Xiaoyi Chen
Yuanyuan Huang
Yuping Ning
Publication venue: 'Frontiers Media SA'
Publication date: 01/03/2023

Introduction: Recent studies in human brain connectomics with multimodal magnetic resonance imaging (MRI) data have widely reported abnormalities in brain structure, function and connectivity associated with schizophrenia (SZ). However, most previous discriminative studies of SZ patients were based on MRI features of brain regions, ignoring the complex relationships within brain networks. Methods: We applied a graph convolutional network (GCN) to discriminate SZ patients using brain-region and connectivity features derived from a combined multimodal MRI and connectomics analysis. Structural magnetic resonance imaging (sMRI) and resting-state functional magnetic resonance imaging (rs-fMRI) data were acquired from 140 SZ patients and 205 normal controls. Eighteen types of brain graphs were constructed for each subject using 3 types of node features, 3 types of edge features, and 2 brain atlases. We investigated the performance of the 18 brain graphs and used TopK pooling layers to highlight salient brain regions (nodes in the graph). Results: The GCN model, which used functional connectivity as edge features and multimodal features (sMRI + fMRI) of brain regions as node features, obtained the highest average accuracy of 95.8%, and outperformed other existing classification studies in SZ patients. In the explainability analysis, we reported that the top 10 salient brain regions, predominantly distributed in the prefrontal and occipital cortices, were mainly involved in the systems of emotion and visual processing. Discussion: Our findings demonstrated that GCN with a combined multimodal MRI and connectomics analysis can effectively improve the classification of SZ at an individual level, indicating a promising direction for the diagnosis of SZ patients. The code is available at https://github.com/CXY-scut/GCN-SZ.git
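
The count of eighteen brain graphs in this abstract follows from crossing 3 node-feature types, 3 edge-feature types and 2 atlases. The sketch below enumerates those combinations. Only the multimodal (sMRI + fMRI) node features and the functional-connectivity edge features are named in the abstract; every other label is a hypothetical placeholder, not the authors' terminology.

```python
from itertools import product

# Only "sMRI+fMRI" and "functional_connectivity" are named in the abstract;
# the remaining labels are placeholders for options it does not specify.
node_features = ["sMRI+fMRI", "node_feature_B", "node_feature_C"]
edge_features = ["functional_connectivity", "edge_feature_B", "edge_feature_C"]
atlases = ["atlas_1", "atlas_2"]

# One brain-graph configuration per (node, edge, atlas) combination.
configs = list(product(node_features, edge_features, atlases))

print(len(configs))
for node, edge, atlas in configs[:3]:
    print(node, edge, atlas)
```

Each configuration would correspond to one graph per subject, which is then passed to the GCN for classification.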

Directory of Open Access Journals